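Despite the remarkable success of generative adversarial networks (GANs) on text, images, and video, generating high-quality tabular data is still under development because of some unique challenges, such as capturing dependencies in imbalanced data and optimizing the quality of synthetic patient data while preserving privacy. In this paper, we propose DP-CGANS, a differentially private conditional GAN framework consisting of data transformation, sampling, conditioning, and network training, to generate realistic and privacy-preserving tabular data. DP-CGANS distinguishes categorical and continuous variables and transforms them to latent space separately. We then construct a conditional vector as an additional input that not only represents the minority class in imbalanced data but also captures the dependency between variables. We inject statistical noise into the gradients during the network training process of DP-CGANS to provide a differential privacy guarantee. We extensively evaluate our model against state-of-the-art generative models on three public datasets and two real-world personal health datasets in terms of statistical similarity, machine learning performance, and privacy measurement. We demonstrate that our model outperforms other comparable models, especially in capturing the dependency between variables. Finally, we present the balance between data utility and privacy in synthetic data generation, considering the different data structures and characteristics of real-world datasets, such as imbalanced variables, abnormal distributions, and data sparsity.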
translated by 谷歌翻译
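The gradient-noise step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract only states that statistical noise is injected into the gradients; the per-example clipping, the Gaussian noise, and the names `clip_gradient`, `private_mean_gradient`, `max_norm`, and `noise_multiplier` are assumptions borrowed from the common DP-SGD recipe, not details confirmed by the source.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each gradient, sum them, add Gaussian noise, then average.

    Assumption: noise standard deviation = noise_multiplier * max_norm,
    as in the usual DP-SGD formulation.
    """
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
    print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds the influence of any single record, which is what lets the added noise translate into a privacy guarantee; the actual privacy budget depends on an accounting step that this sketch does not cover.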
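Unknown nonlinear dynamics often limit the tracking performance of feedforward control. The aim of this paper is to develop a feedforward control framework that can compensate for these unknown nonlinear dynamics using universal function approximators. The feedforward controller is parametrized as a parallel combination of a physics-based model and a neural network, in which both share the same linear autoregressive (AR) dynamics. This parametrization allows for efficient output-error optimization through Sanathanan-Koerner (SK) iterations. Within each SK iteration, the output of the neural network is penalized in the subspace of the physics-based model through orthogonal projection-based regularization, so that the neural network captures only the unmodelled dynamics, resulting in interpretable models.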
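The orthogonal projection-based regularization described in the feedforward-control abstract above can be sketched roughly as follows. This is an illustration under assumptions: the physics-based model is taken to be linear in its parameters, so its subspace is spanned by a list of regressor vectors (`columns`), and the penalty is the squared norm of the neural network output projected onto that span. The function names and the `weight` parameter are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(columns, tol=1e-12):
    """Gram-Schmidt: orthonormal vectors spanning the same subspace as columns."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > tol:  # skip (numerically) dependent columns
            basis.append([vi / norm for vi in v])
    return basis


def project(columns, y):
    """Orthogonal projection of y onto the span of columns."""
    proj = [0.0] * len(y)
    for q in orthonormal_basis(columns):
        c = dot(q, y)
        proj = [p + c * qi for p, qi in zip(proj, q)]
    return proj


def projection_penalty(columns, nn_output, weight):
    """Penalize the part of the network output the physics model could explain."""
    p = project(columns, nn_output)
    return weight * dot(p, p)


if __name__ == "__main__":
    physics_regressors = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # toy regressor vectors
    print(projection_penalty(physics_regressors, [2.0, 3.0, 5.0], weight=0.5))
```

A network output that lies entirely outside the physics model's subspace incurs no penalty, which matches the stated goal of having the network capture only the unmodelled dynamics.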
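We consider Bayesian mixture of finite mixtures (MFM) and Dirichlet process mixture (DPM) models. Recent asymptotic theory has established that DPMs overestimate the number of clusters for large samples and that estimators from both classes of models are inconsistent for the number of clusters under misspecification, but the implications for finite-sample analyses are unclear. The final reported estimate after fitting these models is often a single representative clustering obtained using an MCMC summarisation technique, but it is unknown how well such a summary estimates the number of clusters. Here we investigate these questions through simulations and an application to gene expression data, and find that (i) DPMs overestimate the number of clusters even in finite samples, but only to a limited degree that may be correctable using an appropriate summary, and (ii) misspecification can lead to overestimation of the number of clusters in both DPMs and MFMs, but the results are nevertheless often still interpretable. We provide recommendations on MCMC summarisation and suggest that, although the more appealing asymptotic properties of MFMs provide strong motivation to prefer them, results obtained using MFMs and DPMs are often very similar in practice.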
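Latent Dirichlet allocation (LDA) is widely used for unsupervised topic modelling on sets of documents. No temporal information is used in the model. However, there is often a relationship between the corresponding topics of consecutive tokens. In this paper, we present an extension to LDA that uses a Markov chain to model temporal information. We apply this new model to acoustic unit discovery from speech. As input tokens, the model takes a discretised encoding of speech from a vector quantised (VQ) neural network with 512 codes. The goal is then to map these 512 VQ codes to 50 phone-like units (topics) so that they more closely resemble true phones. In contrast to the base LDA, which only considers how VQ codes co-occur within utterances (documents), the Markov chain LDA also captures how consecutive codes follow one another. This extension leads to improved cluster quality and phone segmentation results compared to the base LDA. Compared to a recent vector quantised neural network approach that also learns 50 units, the extended LDA model performs better in phone segmentation but worse in mutual information.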
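Understanding the results of deep neural networks is an essential step towards wider acceptance of deep learning algorithms. Many approaches address the problem of interpreting artificial neural networks, but they often provide divergent explanations. Moreover, different hyperparameters of an explanation method can lead to conflicting interpretations. In this paper, we propose a technique for aggregating the feature attributions of different explanation algorithms using Restricted Boltzmann Machines (RBMs) to achieve a more reliable and robust interpretation of deep neural networks. Several challenging experiments on real-world datasets show that the proposed RBM method outperforms popular feature attribution methods and basic ensemble techniques.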
Data-driven models such as neural networks are being applied more and more to safety-critical applications, such as the modeling and control of cyber-physical systems. Despite the flexibility of the approach, there are still concerns about the safety of these models in this context, as well as the need for large amounts of potentially expensive data. In particular, when long-term predictions are needed or frequent measurements are not available, the open-loop stability of the model becomes important. However, it is difficult to make such guarantees for complex black-box models such as neural networks, and prior work has shown that model stability is indeed an issue. In this work, we consider an aluminum extraction process where measurements of the internal state of the reactor are time-consuming and expensive. We model the process using neural networks and investigate the role of including skip connections in the network architecture as well as using l1 regularization to induce sparse connection weights. We demonstrate that these measures can greatly improve both the accuracy and the stability of the models for datasets of varying sizes.
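The two architectural measures named in the aluminum-extraction abstract above, skip connections and l1 regularization, can be illustrated with a minimal sketch. The abstract does not specify the form of the skip connection, so concatenating the input to the hidden activations is an assumption here; the layer sizes, the names `skip_forward` and `l1_penalty`, and the coefficient `lam` are likewise hypothetical.

```python
def dense(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def skip_forward(x, w1, b1, w2, b2):
    """Two-layer network whose output layer also sees the raw input.

    Assumption: the skip connection concatenates the input to the hidden
    activations; the source does not state which variant is used.
    """
    hidden = relu(dense(x, w1, b1))
    return dense(hidden + x, w2, b2)  # list concatenation implements the skip


def l1_penalty(weight_matrices, lam):
    """lam * sum of absolute weights; added to the loss to encourage sparsity."""
    return lam * sum(abs(w) for m in weight_matrices for row in m for w in row)


if __name__ == "__main__":
    w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    w2, b2 = [[1.0, 1.0, 1.0, 1.0]], [0.5]
    print(skip_forward([1.0, -2.0], w1, b1, w2, b2))
    print(l1_penalty([w1, w2], lam=0.1))
```

In training, the penalty would be added to the prediction loss so that many connection weights are driven towards zero.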
We test the performance of GAN models for lip synchronization. For this, we reimplement LipGAN in PyTorch, train it on the GRID dataset, and compare it to our own variation, L1WGAN-GP, adapted to the LipGAN architecture and also trained on GRID.
High content imaging assays can capture rich phenotypic response data for large sets of compound treatments, aiding in the characterization and discovery of novel drugs. However, extracting representative features from high content images that can capture subtle nuances in phenotypes remains challenging. The lack of high-quality labels makes it difficult to achieve satisfactory results with supervised deep learning. Self-supervised learning methods, which learn from automatically generated labels and have shown great success on natural images, offer an attractive alternative for microscopy images as well. However, we find that self-supervised learning techniques underperform on high content imaging assays. One challenge is the undesirable domain shifts present in the data, known as batch effects, which may be caused by biological noise or uncontrolled experimental conditions. To this end, we introduce Cross-Domain Consistency Learning (CDCL), a novel approach that is able to learn in the presence of batch effects. CDCL enforces the learning of biological similarities while disregarding undesirable batch-specific signals, which leads to more useful and versatile representations. These features are organised according to their morphological changes and are more useful for downstream tasks, such as distinguishing treatments and mode of action.
Objective: Imbalances of the electrolyte concentration levels in the body can lead to catastrophic consequences, but accurate and accessible measurements could improve patient outcomes. While blood tests provide accurate measurements, they are invasive and the laboratory analysis can be slow or inaccessible. In contrast, an electrocardiogram (ECG) is a widely adopted tool which is quick and simple to acquire. However, the problem of estimating continuous electrolyte concentrations directly from ECGs is not well-studied. We therefore investigate if regression methods can be used for accurate ECG-based prediction of electrolyte concentrations. Methods: We explore the use of deep neural networks (DNNs) for this task. We analyze the regression performance across four electrolytes, utilizing a novel dataset containing over 290000 ECGs. For improved understanding, we also study the full spectrum from continuous predictions to binary classification of extreme concentration levels. To enhance clinical usefulness, we finally extend to a probabilistic regression approach and evaluate different uncertainty estimates. Results: We find that the performance varies significantly between different electrolytes, which is clinically justified in the interplay of electrolytes and their manifestation in the ECG. We also compare the regression accuracy with that of traditional machine learning models, demonstrating superior performance of DNNs. Conclusion: Discretization can lead to good classification performance, but does not help solve the original problem of predicting continuous concentration levels. While probabilistic regression demonstrates potential practical usefulness, the uncertainty estimates are not particularly well-calibrated. Significance: Our study is a first step towards accurate and reliable ECG-based prediction of electrolyte concentration levels.
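The probabilistic regression and calibration ideas in the ECG abstract above can be made concrete with a small sketch. The abstract does not say which predictive distribution or calibration metric is used, so the Gaussian likelihood, the interval-coverage check, and the names `gaussian_nll` and `interval_coverage` are assumptions chosen for illustration.

```python
import math


def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var)).

    A model trained with this loss outputs both a point estimate and a
    variance, i.e. an uncertainty estimate per prediction.
    """
    return 0.5 * (math.log(2.0 * math.pi) + log_var + (y - mean) ** 2 / math.exp(log_var))


def interval_coverage(targets, means, stds, z=1.96):
    """Fraction of targets inside mean +/- z * std.

    For well-calibrated Gaussian predictions and z = 1.96 this should be
    close to 0.95; large deviations indicate poor calibration.
    """
    hits = sum(1 for y, m, s in zip(targets, means, stds) if abs(y - m) <= z * s)
    return hits / len(targets)


if __name__ == "__main__":
    # Toy values, not data from the study.
    print(gaussian_nll(4.1, mean=4.0, log_var=math.log(0.04)))
    print(interval_coverage([4.1, 5.2, 3.9], [4.0, 4.2, 4.0], [0.2, 0.2, 0.2]))
```

Comparing the empirical coverage with the nominal level is one simple way to examine whether predicted uncertainties are well calibrated.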
As spatial audio is enjoying a surge in popularity, data-driven machine learning techniques that have proven successful in other domains are increasingly used to process head-related transfer function measurements. However, these techniques require large amounts of data, whereas existing datasets range from tens to the low hundreds of datapoints. It therefore becomes attractive to combine several of these datasets, even though they are measured under different conditions. In this paper, we first establish the common ground between a number of datasets, and then investigate potential pitfalls of mixing datasets. We perform a simple experiment to test the relevance of the remaining differences between datasets when applying machine learning techniques. Finally, we pinpoint the most relevant differences.